Land v1 acoustic: composite eval, acoustic scope + honest targets, fusion continuity #13

Merged
pgil256 merged 26 commits into main from accuracy/tab-f1-program on Jun 3, 2026

pgil256 commented Jun 3, 2026

Lands the full v1 acoustic program onto main (26 commits). Supersedes #11 — Phase 0 is a strict subset of this branch; #11 will be closed once this merges.

What this lands

  • Phase 0 composite eval — multi-source per-tier harness, parsers (GuitarSet JAMS / Guitar-TECHS MIDI), bootstrap CIs, six-bucket error decomposition.
  • v1 scope = acoustic (SPEC §1.4.1, 2026-06-02) — honest audio-only targets (single-line ≥ 0.45, strummed ≥ 0.60, aggregate ≥ 0.55). Single-line is information-limited from audio (string/fret ambiguity); 0.94 single-line moves to v1.1 (video string-resolution).
  • Electric → v2 — evidence-based: clean-electric Tab F1 measured 0.12 on an acoustic-trained backbone with no in-repo training code. Ships the tone toggle (routes electric to a separate v2 checkpoint), the v2 fine-tune design doc, and resumable EGDB/Guitar-TECHS acquirers.
  • Fusion continuity win + SPEC sync.
  • Windows path fix (this session): _relativize_to_data_root uses Path.relative_to / as_posix instead of a hard-coded / prefix, so checked-in manifests no longer leak C:\... paths. Adds a PureWindowsPath regression test.
  • Format hygiene: ruff format pass over 12 pre-existing unformatted Phase 0 files, the only red check in CI on #11 (Phase 0: per-tier composite eval + first GuitarSet baseline).

Verification (local)

  • ruff check clean, ruff format --check clean, mypy tabvision clean (56 files), eval/unit tests pass.
  • The formal all-metrics acceptance run (§1.4.1, GuitarSet held-out player 05) is executing separately; results land in docs/EVAL_REPORTS/ + docs/DECISIONS.md.

🤖 Generated with Claude Code

Patrick Gilhooley and others added 26 commits May 19, 2026 14:25
First Phase 0 chunk per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md
§1.1. Foundations for the composite-eval workflow; no production
behavior changes.

- tabvision.eval.parsers.registry: ParserFn protocol +
  register_parser / get_parser / list_parsers. Each source-specific
  annotation format gets a parser that registers itself at import
  time; composite-eval dispatches by Manifest.clip.annotation_format.
- tabvision.eval.parsers.guitarset_jams: thin wrapper exposing the
  existing tabvision.eval.guitarset_audio.parse_guitarset_jams under
  the new uniform interface. No logic duplication.
- tabvision.eval.bootstrap: bootstrap_ci() returning a BootstrapResult
  (statistic, lower, upper, n_observations, n_bootstrap, confidence).
  Implements the per-tier acceptance gate from the strategy doc §5
  (lower_95_CI >= target, not just mean >= target).
- 21 unit tests, all passing. Existing test_guitarset_audio_eval.py
  unchanged and still green.

Ruff + mypy clean on the new files.
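
A minimal sketch of the lower-bound gate described above, assuming a percentile bootstrap over per-clip scores. The field names on BootstrapResult come from the commit message; the bootstrap_ci signature, the resampling method, and the passes_gate helper are assumptions for illustration, not the module's actual code.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class BootstrapResult:
    statistic: float
    lower: float
    upper: float
    n_observations: int
    n_bootstrap: int
    confidence: float


def bootstrap_ci(values, n_bootstrap=1000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean (assumed method and signature)."""
    if not values:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_bootstrap))
    alpha = (1.0 - confidence) / 2.0
    lower = means[int(alpha * (n_bootstrap - 1))]
    upper = means[int((1.0 - alpha) * (n_bootstrap - 1))]
    return BootstrapResult(mean(values), lower, upper, n, n_bootstrap, confidence)


def passes_gate(result, target):
    # Hypothetical helper: the gate compares the lower bound, not the mean.
    return result.lower >= target
```

A tier whose mean clears the target can still miss the gate if the interval is wide, which is the point of gating on the lower bound.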
…tar-techs parser

Phase 0 items 1-2 per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md.

Manifest (tabvision/tabvision/eval/manifest.py):
- Add 'annotation_format' to REQUIRED_CLIP_FIELDS so composite-eval
  can route each clip to the correct parser via the registry.
- Add SYNTHETIC_SOURCE_PREFIXES + cross-contamination guard: clips
  whose source starts with 'synthtab/', 'dadagp/', or 'synthetic/'
  are rejected in 'validation' and 'test' splits. Permitted in
  'train'. Implements R8 from the strategy doc §7.
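
The cross-contamination guard can be sketched as below. The prefixes and split names are taken from the commit message; the function name check_clip and the exception type are assumptions, and the real validator may report errors differently.

```python
SYNTHETIC_SOURCE_PREFIXES = ("synthtab/", "dadagp/", "synthetic/")
EVAL_SPLITS = ("validation", "test")


def check_clip(source: str, split: str) -> None:
    """Raise if a synthetic-source clip sits in an evaluation split."""
    # str.startswith accepts a tuple of prefixes.
    if split in EVAL_SPLITS and source.startswith(SYNTHETIC_SOURCE_PREFIXES):
        raise ValueError(
            f"synthetic source {source!r} is not allowed in split {split!r}"
        )
```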

Guitar-TECHS parser (tabvision/tabvision/eval/parsers/guitar_techs_midi.py):
- Parses 6-track MIDI (one track per string, low E first) into
  list[TabEvent] via pretty_midi. Per-string fret derived from
  MIDI pitch minus open-string pitch. Drops out-of-range frets.
- Optional 'track_to_string' kwarg for releases with a different
  ordering. Default = identity (low E = 0, high E = 5).
- 9 unit tests using pretty_midi-built fixtures; importorskip when
  pretty_midi not installed.
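
The fret arithmetic above, sketched without pretty_midi so it runs anywhere. Notes are plain (onset_s, midi_pitch) tuples and the output is (onset_s, string, fret) tuples rather than TabEvent objects; the open-string pitches assume standard tuning, and MAX_FRET and the function name are assumptions.

```python
# Standard-tuning open-string MIDI pitches, low E first (E2 A2 D3 G3 B3 E4).
OPEN_STRING_PITCHES = (40, 45, 50, 55, 59, 64)
MAX_FRET = 24  # assumed upper bound; the parser's actual range is not stated


def notes_to_positions(tracks, track_to_string=None):
    """Map per-track (onset_s, midi_pitch) notes to (onset_s, string, fret)."""
    if track_to_string is None:
        # Default = identity: track 0 is low E (string 0), track 5 is high E.
        track_to_string = {i: i for i in range(len(OPEN_STRING_PITCHES))}
    positions = []
    for track_idx, notes in enumerate(tracks):
        string = track_to_string[track_idx]
        open_pitch = OPEN_STRING_PITCHES[string]
        for onset_s, pitch in notes:
            fret = pitch - open_pitch
            if 0 <= fret <= MAX_FRET:  # drop out-of-range frets
                positions.append((onset_s, string, fret))
    return sorted(positions)
```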

Updated manifest placeholder TOML schema with annotation_format and
synthetic-source guard documentation. 4 new manifest validator tests.
All 15 new tests pass; existing test_eval_manifest.py / test_parsers_registry.py
still green. Ruff + mypy clean.
Phase 0 item 3 per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md.

Six-bucket decomposition matching the apr-28 methodology in
tabvision-server/tools/outputs/errors-2026-04-28_185743.md, ported
to operate on v1 §8 TabEvent lists:

- correct: string + fret + onset all match within tolerance
- wrong_position_same_pitch: pitch matches, position doesn't
- pitch_off: onset matches but pitch and position differ
- timing_only: pos or pitch matches outside strict tolerance but
  within extended tolerance
- missed_onset: gold event with no nearby predicted event
- extra_detection: predicted event unmatched by either pass

(The seventh apr-28 bucket, muted_undetectable, needs a muted/X flag
the v1 TabEvent contract does not yet carry; deferred.)

Two-pass greedy matcher prioritizes (a) strict-tolerance closest
onset, then (b) extended-tolerance pos-or-pitch match for timing_only.
share_of_loss() returns per-bucket percentages of recoverable loss.
aggregate_decompositions() sums per-track decompositions for the
per-tier rollup that composite.py will produce.

16 unit tests covering each bucket in isolation, the mixed scenario,
share-of-loss math, aggregation, and edge cases (multiple gold at
same time, greedy onset-closest selection, invalid tolerances).
Ruff + mypy clean.
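
A sketch of the share-of-loss and aggregation arithmetic, assuming "recoverable loss" means the sum of every bucket except correct. Bucket names are from the list above; representing a decomposition as a plain dict of counts is an assumption.

```python
from collections import Counter

LOSS_BUCKETS = (
    "wrong_position_same_pitch",
    "pitch_off",
    "timing_only",
    "missed_onset",
    "extra_detection",
)


def share_of_loss(counts):
    """Per-bucket percentage of total loss (every bucket except 'correct')."""
    total = sum(counts.get(b, 0) for b in LOSS_BUCKETS)
    if total == 0:
        return {b: 0.0 for b in LOSS_BUCKETS}
    return {b: 100.0 * counts.get(b, 0) / total for b in LOSS_BUCKETS}


def aggregate_decompositions(per_track):
    """Sum per-track bucket counts into one rollup."""
    rollup = Counter()
    for counts in per_track:
        rollup.update(counts)
    return dict(rollup)
```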
Phase 0 item 4 per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md.

tabvision.eval.composite.run_composite_eval:
- Reads + validates a multi-source manifest, dispatches each clip
  through the registered parser, runs a user-supplied predictor over
  the media, and computes onset / pitch / tab F1 + 95% bootstrap CIs
  per tier plus the 6-bucket error decomposition.
- Predictor is injected so the harness is testable without the heavy
  audio backend; CLI wires up tabvision.pipeline.run_pipeline.
- Train-split clips skipped by default (DEFAULT_EVAL_SPLITS =
  validation + test).
- CompositeReport.tab_f1_acceptance(targets) classifies each tier as
  pass / gap / fail / missing based on the lower_95_CI >= target gate
  from strategy doc §5.
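
One plausible reading of the four statuses, sketched as a pure function. Only the pass rule (lower_95_CI >= target) is stated in the commit; treating "gap" as mean at or above target with the lower bound below it, and "fail" as mean below target, is an assumption, as is the function name.

```python
def classify_tier(mean, lower_95, target):
    """Classify one tier against its Tab F1 target (assumed semantics)."""
    if mean is None or lower_95 is None:
        return "missing"  # no clips evaluated for this tier
    if lower_95 >= target:
        return "pass"  # the stated acceptance gate
    if mean >= target:
        return "gap"  # assumed: mean clears the target, lower bound does not
    return "fail"  # assumed: mean itself is below the target
```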

tabvision.eval.metrics: added public event_f1() + EventF1Result for
onset-only and onset+pitch matching. The private _score_event_f1 in
guitarset_audio is left untouched (Phase 0 ground rule: no production
behavior changes).
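
A rough sketch of onset-only versus onset+pitch event F1, assuming greedy one-to-one matching on (onset_s, midi_pitch) tuples. The result fields, default tolerance, and matching strategy are assumptions; the public event_f1() may differ on all three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventF1Result:
    precision: float
    recall: float
    f1: float


def event_f1(gold, pred, onset_tolerance_s=0.05, match_pitch=False):
    """Greedy one-to-one matching of (onset_s, midi_pitch) events."""
    used = set()
    true_positives = 0
    for g_onset, g_pitch in gold:
        best, best_dt = None, None
        for i, (p_onset, p_pitch) in enumerate(pred):
            if i in used or (match_pitch and p_pitch != g_pitch):
                continue
            dt = abs(p_onset - g_onset)
            if dt <= onset_tolerance_s and (best_dt is None or dt < best_dt):
                best, best_dt = i, dt
        if best is not None:
            used.add(best)
            true_positives += 1
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return EventF1Result(precision, recall, f1)
```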

11 integration smoke tests covering perfect predictor (all tiers pass),
shifted predictor (wrong_position_same_pitch dominates), train-split
skipping, manifest validation failures, parser-format lookup failures,
TABVISION_DATA_ROOT substitution via env + function arg, empty gold
edge case, and the acceptance helper. Ruff + mypy clean.
Phase 0 item 5 per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md.

tabvision.eval.composite:
- DEFAULT_TIER_TARGETS = {0.85/0.90/0.87/0.80} from SPEC §1.4.1.
- format_baseline_markdown(report, targets, ...) renders the per-tier
  baseline table with pass/gap/fail/missing status, per-source
  breakdown, and methodology footer per Phase 0 impl plan §4.1.
- format_decomposition_markdown(report) renders the aggregate +
  per-tier 7-bucket (currently 6) error breakdown per §4.2.
- make_run_pipeline_predictor(...) wraps tabvision.pipeline.run_pipeline
  with lazy import — composite-eval --help works without the
  audio-highres extras installed.
- main() — argparse CLI exposed as 'tabvision-composite-eval'.
  Supports --backend, --position-prior (or 'none'), --melodic-prior,
  --enable-video, --bootstrap-{n,seed}, --onset-tolerance-s,
  --splits, --media-root, --annotation-root, --eval-harness-sha.
  Single run can emit both the baseline and decomposition reports
  via --decomposition-output, so the separate decompose_tab_errors.py
  script listed in the Phase 0 plan is consolidated into this one CLI.

tabvision/scripts/eval/composite_eval.py: 5-line shim that invokes
the module's main().

7 unit tests on the formatters: required sections, pass/gap/fail/missing
classification, methodology fields, decomposition aggregate sums,
default-target coverage. All 20 composite tests + 73 Phase 0 eval tests
pass. Ruff + mypy clean.
Phase 0 item 6a per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md.

tabvision.eval.manifest_builder:
- scan_guitarset(root, validation_player) — discovers <root>/annotation/*.jams
  paired with <root>/audio_mono-mic/*_mic.wav; maps _comp/_solo suffix
  to clean_acoustic_strummed/single_line tier.
- scan_guitar_techs(root) — stub returning [] until the dataset is
  acquired and its on-disk layout is verified.
- apply_limits(entries, max_clips_per_tier, total_limit) — deterministic
  per-tier cap + total cap, sorted by clip id first so re-runs produce
  byte-stable output.
- build_manifest(splits=...) — full pipeline; supports filtering by
  split so smoke runs target the validation set directly.
- render_toml(entries, header_comment) — TOML output with proper
  escaping and a generated-by header.
- _refuse_synthetic_in_eval_splits — pre-write guard mirroring the
  validator's R8 cross-contamination check.
- main() CLI: --guitarset, --guitar-techs, --output, --splits,
  --max-clips-per-tier, --limit. Returns rc=1 on no clips, rc=2 on
  validation failure, rc=0 on success.
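
The deterministic capping in apply_limits can be sketched as follows. Entries are modelled as dicts with "id" and "tier" keys, which is an assumption about the entry type; the sort-by-id-first rule is from the commit message.

```python
def apply_limits(entries, max_clips_per_tier=None, total_limit=None):
    """Cap clips per tier, then overall, in clip-id order for stable output."""
    kept = []
    per_tier = {}
    # Sorting first makes the result independent of scan order.
    for entry in sorted(entries, key=lambda e: e["id"]):
        tier = entry["tier"]
        if max_clips_per_tier is not None and per_tier.get(tier, 0) >= max_clips_per_tier:
            continue
        per_tier[tier] = per_tier.get(tier, 0) + 1
        kept.append(entry)
    return kept if total_limit is None else kept[:total_limit]
```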

tabvision/scripts/eval/build_composite_manifest.py — thin CLI shim.

Hygiene pass per PR feedback:
- manifest.toml schema comment now lists guitar_techs_midi alongside
  guitarset_jams under 'known formats'.
- Error-decomposition framing in composite.py and error_decomposition.py
  now uses 'six-bucket port of the apr-28 7-bucket harness' instead
  of '7-bucket' (we only populate 6 — muted_undetectable is deferred).
- composite.py and manifest_builder.py both gain if __name__ ==
  '__main__' blocks so 'python -m tabvision.eval.composite' and
  'python -m tabvision.eval.manifest_builder' invoke main() cleanly.

20 manifest-builder tests pass (scan, limits, render, summarise,
build_manifest, --splits filter, end-to-end CLI). Full Phase 0 test
suite still green. Ruff + mypy clean.

Smoke-validated against on-disk GuitarSet: --max-clips-per-tier 2
--splits validation produces a 4-clip manifest that the composite
eval CLI processes end-to-end via the real highres backend +
guitarset-v1 prior, emitting baseline + decomposition reports with
sensible numbers (strummed Tab F1 ~0.75, single-line ~0.29 on this
tiny sample).
Closes the Phase 0 acceptance gate for the 2 tiers reachable from
on-disk data (clean acoustic single-line + strummed via GuitarSet
held-out validation). Clean electric and distorted electric remain
'missing' pending Guitar-TECHS / EGDB acquisition.

Matcher fix (tabvision/tabvision/eval/error_decomposition.py):
- decompose_errors() now uses priority-based selection within each
  onset tolerance window: same (string, fret) > same pitch_midi >
  onset-closest. Previously a greedy onset-only matcher mis-paired
  chord-cluster events whose on-the-wire ordering differed from
  ground truth, inflating pitch_off on strummed (3387 → 486 with
  the fix). event_f1's pitch-matching semantics are now mirrored
  in the decomposition.
- Added test_chord_cluster_priority_pitch_over_onset and
  test_chord_cluster_priority_falls_back_to_position_match_then_pitch
  to lock the new behavior.
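
The priority order (same (string, fret) > same pitch_midi > onset-closest) can be sketched as a ranking key. Event is a hypothetical stand-in for TabEvent; only pitch_midi is a field name taken from the commit, the others are assumed, and the sketch selects a match for a single gold event rather than running the full two-pass matcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:  # hypothetical stand-in for TabEvent
    onset_s: float
    string: int
    fret: int
    pitch_midi: int


def pick_match(gold, candidates, tolerance_s):
    """Pick the best predicted event for one gold event, or None."""
    in_window = [c for c in candidates if abs(c.onset_s - gold.onset_s) <= tolerance_s]
    if not in_window:
        return None

    def rank(c):
        if (c.string, c.fret) == (gold.string, gold.fret):
            tier = 0  # exact position match
        elif c.pitch_midi == gold.pitch_midi:
            tier = 1  # right pitch, different position
        else:
            tier = 2  # fall back to onset proximity only
        return (tier, abs(c.onset_s - gold.onset_s))

    return min(in_window, key=rank)
```

With an onset-only key, the nearest event in a strummed chord wins even when it is a different note, which is the mis-pairing the commit describes.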

Reports (docs/EVAL_REPORTS/*):
- composite_baseline_2026-05-13.md — first artifact under
  SPEC §1.4.1: per-tier Tab F1 + Onset/Pitch F1 + 95% bootstrap CI
  + pass/gap/fail/missing status. Headline: both covered tiers
  FAIL by ~25-35 pp (single-line mean 0.5076, strummed 0.6708).
- tab_f1_error_decomposition_2026-05-13.md — companion 6-bucket
  breakdown. Headline: wrong_position_same_pitch dominates loss
  on every tier — 77% of single-line, 50% of strummed, 57% aggregate.
  Confirms the strategy doc §2 diagnostic.

Eval manifest (tabvision/data/eval/composite.toml):
- 60 player-05 validation clips, byte-stable output of the manifest
  builder. Strummed and single-line tiers fully covered.

LICENSES.md:
- GuitarSet: marked '✅ used for 2026-05-13 baseline'.
- Guitar-TECHS: added as planned acquisition (CC-BY-4.0).
- EGDB: status updated; author email pending.
- GOAT: marked ❌ DROPPED (request-only research-only).
- SynthTab: marked ❌ DROPPED from default pipeline (CC-BY-NC-4.0).
- User clips: marked ⛔ banned per D10.
- DadaGP: marked research/dev only; not in default pipeline.

DECISIONS.md: single 2026-05-13 entry summarising D1-D11 from the
design plan, with per-tier targets table and the 2026-05-13 baseline
numbers inlined so the decision record stands alone.

104 tests pass; ruff + mypy clean.
…ording

Three small fixes flagged in review of the Phase 0 baseline:

(a) Portable manifest. tabvision.eval.manifest_builder now accepts
    --data-root PATH; render_toml rewrites media/annotation paths
    that fall under that root as a TABVISION_DATA_ROOT token plus '/<rest>'. The
    composite-eval CLI already expanded that token via env var or
    --media-root/--annotation-root, so checked-in manifests are now
    portable across developer machines. Re-generated
    tabvision/data/eval/composite.toml with the new flag so the
    committed manifest no longer carries /home/gilhooleyp/... paths.
    +3 unit tests covering the rewrite + the no-data-root path.

(b) Real SHA in the baseline report. The 'Eval-harness SHA' field
    in docs/EVAL_REPORTS/composite_baseline_2026-05-13.md now cites
    2ec4849 (the commit that landed both the baseline and the
    chord-cluster matcher fix), instead of the ad-hoc
    '354571b-matcher-fix' label used at run time.

(c) Stale '7-bucket' wording cleared in the planning docs and one
    test docstring. The implementation is a six-bucket port; only
    references to the original apr-28 7-bucket harness keep the
    historical name.

Verification ran in WSL:
- ruff: passes on changed files.
- mypy: clean on the 8 Phase 0 eval source files (parsers/, bootstrap,
  error_decomposition, composite, manifest_builder). Broader
  tabvision-wide mypy hits older Phase 5 diagnostics not in this PR's
  scope.
- 107 tests pass across the focused Phase 0 + existing eval suite.

No production behavior change; the manifest still resolves to the
same 60 player-05 validation clips.
…otstrap CI, error decomposition

Lands origin/impl/tab-f1-phase-0 (9 commits): composite.toml eval manifest,
guitarset_jams + guitar_techs_midi parsers, bootstrap CI helper, six-bucket
error decomposition, and first per-tier baseline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… relaxation)

SPEC §1.4.1 rewritten to supersede the 2026-05-13 amendment: v1 commits to
the original §1.4 per-tier targets (0.94/0.86/0.90/0.82) AND aggregate
Tab F1 >= 0.88. The relaxed 0.85/0.90/0.87/0.80 table is withdrawn; the
aggregate is un-retired. Keeps the amendment's methodology (public-corpus
composite, per-tier bootstrap CIs, lower_95_CI >= target). SPEC §1.4 is now
the single source of truth; CLAUDE.md notes the commitment and the design
doc D1/D2 are bannered as historical.

Honest framing retained in-spec: single-line tier must go 0.51 -> 0.94; a
stretch goal adopted as the gate, not a forecast.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add an 'egdb' subcommand to scripts.acquire.datasets mirroring the roboflow
pattern: downloads from the author-granted access URL (--url / $EGDB_DOWNLOAD_URL),
optional SHA-256 verify, zip/tar extract, idempotent. No URL/data is hard-coded
or committed. LICENSES.md flips EGDB to author-granted eval-use (2026-06-01),
eval-only, not redistributed, not a shipped-weight substrate. .env.example
gains EGDB_DOWNLOAD_URL.
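
The optional SHA-256 verify step might look like the sketch below, using only hashlib. Both function names and the streaming-chunk design are assumptions; the acquirer's actual verification code is not shown in this PR.

```python
import hashlib
from pathlib import Path


def iter_chunks(path, chunk_size=1 << 20):
    """Yield a file's bytes in chunks so large archives are not loaded whole."""
    with Path(path).open("rb") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk


def verify_sha256(chunks, expected_hex=None):
    """Hash the chunks; raise if a digest was supplied and does not match."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    actual = digest.hexdigest()
    if expected_hex is not None and actual != expected_hex.lower():
        raise ValueError(f"SHA-256 mismatch: expected {expected_hex}, got {actual}")
    return actual
```

Usage would be along the lines of verify_sha256(iter_chunks(archive_path), expected), with expected left as None when no digest is configured.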

ACTION REQUIRED (user): drop in the grant URL to run it, and file the grant
email under docs/ + log in docs/DECISIONS.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… AGENTS.md

Remove abandoned multi-agent dev experiment (.claude-agent-farm.json,
tabvision_agent_farm_config.json, tabvision_agent_farm_prompt.txt,
tabvision_agent_config.json, tabvision_prompt.txt) and the stale
coordination/ work queue (referenced frozen v0 paths). Remove stray
combined_typechecker_and_linter_problems.txt. Banner tabvision_specification.md
as historical/non-canonical (SPEC.md is canonical; still linked from
AUDIT/README so kept, not deleted). Track AGENTS.md (Codex sibling of CLAUDE.md).
All recoverable via git history.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Verified 2026-06-01 against the project page
(https://ss12f32v.github.io/Guitar-Transcription/): EGDB audio is a *public*
Google Drive folder; access is open and the *license* was the only gate
(repo has no LICENSE file -> author's portfolio-use grant on record clears it).

- egdb acquirer now defaults to the public Drive folder and downloads via
  gdown (folder-aware), with a clean manual-download fallback when gdown is
  absent. Direct-archive path kept for mirrors.
- LICENSES.md / .env.example corrected: access-open, license-is-the-gate;
  EGDB_DOWNLOAD_URL is now an optional mirror override, not a required secret.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… scanner, runbook

Wires the cross-dataset prior-generalization check to run locally on CPU:

- scripts.acquire.datasets gains 'guitarset' (mirdata → the layout
  scan_guitarset/composite.toml expect) and 'guitar-techs' (Zenodo record
  14963133 via the public API, no hard-coded filenames; prints the tree to
  verify layout). Both CC-BY-4.0, eval-only, idempotent.
- Implements the stubbed manifest_builder.scan_guitar_techs: pairs 6-track
  MIDI with same-stem/prefix-stem audio (DI/clean preferred), tier=clean_electric
  (the tier GuitarSet can't cover + the #2 cross-dataset target), performer
  split, skips stretch-technique clips. Layout inferred from arXiv:2501.03720 —
  flagged to verify against the first real download.
- test_scan_guitar_techs.py pins the heuristics on a synthetic tree (runs under
  pytest or as a plain script; validated here without the dep).
- docs/plans/2026-06-02-tab-f1-phase-0-local-run.md: turnkey runbook (install →
  acquire → build manifests → prior on/off → read the verdict).
- LICENSES.md: Guitar-TECHS row → acquirer/scanner landed, eval-only.

#3 fine-tune stays on free GPU (no CUDA locally). EGDB folds in a 4th tier later.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The acquirers printed Unicode arrows/ellipses/em-dashes; on a Windows cp1252
console print() raised UnicodeEncodeError on U+2192 before mirdata ran, killing
the guitarset download. Replace them with ASCII equivalents (->, ..., -). Run acquirers with
PYTHONUTF8=1 as belt-and-suspenders (also shields third-party console output).
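
The failure mode can be reproduced without a Windows console by encoding to cp1252 directly. The replacement table and both helper names are assumptions for illustration; U+2192 is the character the commit names as the one that raised.

```python
# Characters the acquirers printed, mapped to ASCII stand-ins.
ASCII_REPLACEMENTS = {
    "\u2192": "->",  # rightwards arrow
    "\u2026": "...",  # horizontal ellipsis
    "\u2014": "-",  # em dash
}


def to_ascii(message):
    """Replace the known non-ASCII punctuation with ASCII equivalents."""
    for char, replacement in ASCII_REPLACEMENTS.items():
        message = message.replace(char, replacement)
    return message


def encodable(message, encoding="cp1252"):
    """Return True if print() to a console with this encoding would succeed."""
    try:
        message.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```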

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mirdata download() pulled all partitions (~10GB incl. 3.36GB hex-pickup zips +
mix) but the composite eval reads only annotation/*.jams + audio_mono-mic/*_mic.wav.
Pass partial_download=['annotations','audio_mic']; harden idempotency to require
both annotation jams AND mono-mic wavs (so a partial leftover won't false-skip).
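
A sketch of the hardened idempotency check, assuming "complete" means at least one file matches each required pattern. The two patterns come from the commit message; the function name and the relative-path input are assumptions, and the real check may count files instead.

```python
from fnmatch import fnmatch

# Both partitions the composite eval reads must be present.
REQUIRED_PATTERNS = ("annotation/*.jams", "audio_mono-mic/*_mic.wav")


def download_is_complete(relative_paths):
    """True only if every required pattern matches at least one path."""
    paths = list(relative_paths)
    return all(
        any(fnmatch(path, pattern) for path in paths)
        for pattern in REQUIRED_PATTERNS
    )
```

The input could be built with something like [p.relative_to(root).as_posix() for p in root.rglob("*")].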

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Verified against Zenodo record 14963133: clips are <Pn_category>/midi/midi_<content>.mid
paired with <Pn_category>/audio/<capture>/<capture>_<content>.<ext>. MIDI and audio
share the <content> token, NOT a prefix — the inferred prefix-matcher would have
found ZERO clips. Now: pair by content token scoped to the Pn_category group,
prefer direct-input over mic'd amp, performer split from the 'Pn'/'playerNN'
prefix, skip __MACOSX cruft + stretch-technique paths. Validated on the real
partial download (58 clips paired correctly). Test rewritten to the real layout.
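
Pairing by content token within a Pn_category group can be sketched over relative POSIX paths. The directory layout follows the commit message; the capture names in capture_preference are hypothetical placeholders, and the stretch-technique filter and performer split are left out.

```python
from pathlib import PurePosixPath


def pair_clips(relative_paths, capture_preference=("directinput", "micamp")):
    """Pair <group>/midi/midi_<content>.mid with the preferred audio capture."""
    midi = {}  # (group, content) -> midi path
    audio = {}  # (group, content) -> {capture: audio path}
    for raw in relative_paths:
        path = PurePosixPath(raw)
        parts = path.parts
        if "__MACOSX" in parts:
            continue  # archive cruft
        if len(parts) == 3 and parts[1] == "midi" and path.suffix == ".mid":
            if path.stem.startswith("midi_"):
                midi[(parts[0], path.stem[len("midi_"):])] = raw
        elif len(parts) == 4 and parts[1] == "audio":
            prefix = parts[2] + "_"
            if path.stem.startswith(prefix):
                key = (parts[0], path.stem[len(prefix):])
                audio.setdefault(key, {})[parts[2]] = raw

    def rank(capture):
        if capture in capture_preference:
            return (capture_preference.index(capture), capture)
        return (len(capture_preference), capture)

    pairs = []
    for key in sorted(midi):
        captures = audio.get(key)
        if captures:
            pairs.append((midi[key], captures[min(captures, key=rank)]))
    return pairs
```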

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The whole-dir idempotency false-skipped any partial download, and one network
blip (mid P1_scales.zip over VPN) aborted the entire multi-GB fetch. Now: skip
per-file when the extracted dir already exists (re-run resumes), drop partials
and continue past a failed file instead of aborting, and handle corrupt zips.
Re-running the command now completes only the missing categories.
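
The per-file resume loop, sketched with the network and filesystem steps injected as callables so the control flow can be read on its own. All four parameter names are hypothetical; the acquirer's real structure may differ.

```python
import zipfile


def fetch_missing(names, is_extracted, fetch_and_extract, discard_partial):
    """Fetch only what is missing; keep going past individual failures."""
    skipped, fetched, failed = [], [], []
    for name in names:
        if is_extracted(name):
            skipped.append(name)  # a re-run resumes instead of restarting
            continue
        try:
            fetch_and_extract(name)
        except (OSError, zipfile.BadZipFile):
            # Network errors are OSError subclasses; corrupt archives
            # raise BadZipFile. Drop the partial and move on.
            discard_partial(name)
            failed.append(name)
            continue
        fetched.append(name)
    return skipped, fetched, failed
```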

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Four local CPU eval reports + cross-dataset summary + DECISIONS entry.
GuitarSet acoustic reproduces the +22pp prior lift (single 0.219->0.508,
strummed 0.475->0.671, onset/pitch ~0.93). Guitar-TECHS electric: prior lift
+1.3pp (within 95% CI), onset/pitch collapse to 0.75/0.73. Dominant finding:
the highres acoustic backbone doesn't generalize to electric, capping Tab F1
~0.12 and blocking the SPEC clean/distorted-electric tiers. Next step pivots
from a GuitarSet-only fine-tune to evaluating an electric-capable backbone.

(Machine-local manifests with absolute paths not committed — harness
_relativize_to_data_root has a Windows-separator bug; gitignored + flagged.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… help electric

highres-fl was dead code — it passed instrument='guitar_fl', but the pinned
hf_midi_transcription only knows saxophone/bass/guitar/piano. guitar-fl.pth does
exist in the HF repo, so load it by passing the full repo/file path as
checkpoint_path (instrument='guitar' for the architecture). Verified end-to-end.

Result (paired, 12 Guitar-TECHS chord clips): guitar_fl ~= guitar_gaps on
electric (pitch 0.687 vs 0.679, onset 0.715 vs 0.732 — within noise). The cheap
checkpoint swap does NOT close the electric gap; both ~0.68 pitch vs ~0.93
acoustic. Electric needs fine-tuning on electric data.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Decision: train a SEPARATE guitar-electric checkpoint (fine-tuned from gaps),
routed by the declared tone — avoids catastrophic forgetting of the acoustic
0.93; the architecture already routes by checkpoint (highres vs highres-fl).

Honest blocker captured: no highres training code in-repo or in the inference
packages (audio_finetune.py is a scaffold; the 2026-04-24 design targets Basic
Pitch). Step 0 is standing up the upstream hFT-Transformer/piano_transcription
training code. Data (Guitar-TECHS, CC-BY) is on disk; split by performer; free
GPU per D6; acceptance = electric pitch F1 0.73 -> >=0.88, acoustic unchanged.
Includes a Basic-Pitch fallback path and the highres-electric integration steps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Evidence-based scope (DECISIONS 2026-06-02): clean-electric measured 0.12
(acoustic-trained backbone, no in-repo training code), so the electric tiers
move to v2 — delivered as a SEPARATE highres-electric checkpoint routed by the
declared instrument (avoids catastrophic forgetting of the acoustic 0.93; the
architecture already routes by checkpoint).

- backend.py registers highres-electric; highres.py adds the guitar_electric
  variant guarded by TABVISION_HIGHRES_ELECTRIC_CKPT (fails fast with a clear
  message until the v2 checkpoint is trained).
- pipeline.audio_backend_for_session() routes electric -> highres-electric;
  run_pipeline(audio_backend_name='auto') enables the toggle. Acoustic untouched.
- tests/unit/test_audio_routing.py (routing + guard).
- SPEC §1.4.1 + CLAUDE.md: v1 = acoustic tiers (0.94/0.86) + aggregate 0.88;
  electric deferred to v2 with the toggle shipped.
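
A sketch of the tone toggle and the fail-fast guard. The backend names and the TABVISION_HIGHRES_ELECTRIC_CKPT variable are from the commit; both function names, the string-valued tone argument, and the error type are assumptions (the real router is pipeline.audio_backend_for_session(), whose signature is not shown here).

```python
import os

ELECTRIC_CKPT_ENV = "TABVISION_HIGHRES_ELECTRIC_CKPT"


def audio_backend_for_tone(tone):
    """Route electric to the separate v2 backend; everything else is acoustic."""
    return "highres-electric" if tone == "electric" else "highres"


def resolve_electric_checkpoint(env=None):
    """Return the electric checkpoint path or fail fast with a clear message."""
    env = os.environ if env is None else env
    path = env.get(ELECTRIC_CKPT_ENV)
    if not path:
        raise RuntimeError(
            f"{ELECTRIC_CKPT_ENV} is not set; the highres-electric backend "
            "needs a trained v2 checkpoint."
        )
    return path
```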

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Diagnosed the single-line gap (docs/EVAL_REPORTS/acoustic_single_line_2026-06-02.md):
the loss is 322 wrong_position_same_pitch vs 8 pitch_off — audio can't resolve
which STRING a (correct) pitch was played on. Melodic prior regresses it;
hand-position continuity (POSITION_SHIFT_COST 0.05 -> 2.5, now the default + env
knob) gives a real but small lift (single 0.508->0.523, strummed 0.671->0.676,
no regression) and does NOT reach 0.94. Single-line is information-limited.

SPEC §1.4.1 + CLAUDE.md: honest audio-only v1 targets — single-line >= 0.45,
strummed >= 0.60, aggregate >= 0.55 (lower_95 >= target); the 0.94/0.86 become
the v1.1 video-assisted reference (video resolves the string ambiguity).
DECISIONS records the evidence chain so the dead ends aren't re-ground.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The old prefix check hard-coded a forward slash, so on Windows (backslash
absolute paths) it never matched and leaked absolute drive paths into
checked-in manifests. Switch to Path.relative_to + as_posix, separator-correct
on the native platform, always emitting forward-slash TABVISION_DATA_ROOT
tokens. Adds a PureWindowsPath regression test exercising Windows behaviour
from POSIX CI.
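
The separator-safe rewrite can be sketched with pure paths, which is also how Windows behaviour can be exercised from POSIX. The token spelling "${TABVISION_DATA_ROOT}" and the fallback for paths outside the root are assumptions; the function here is an illustration, not the harness's _relativize_to_data_root.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

DATA_ROOT_TOKEN = "${TABVISION_DATA_ROOT}"  # assumed spelling of the token


def relativize_to_data_root(path: PurePath, data_root: PurePath) -> str:
    """Rewrite a path under data_root as a portable forward-slash token path."""
    try:
        rest = path.relative_to(data_root)  # separator-aware per path flavour
    except ValueError:
        # Assumed fallback: paths outside the root are emitted unchanged.
        return path.as_posix()
    return f"{DATA_ROOT_TOKEN}/{rest.as_posix()}"
```

A string prefix check against "/" never matches r"C:\data\...", whereas relative_to compares path components, so the same code handles both flavours.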

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-existing Phase 0 files were committed unformatted and failed CI's
ruff format --check. Mechanical formatting only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

pgil256 merged commit 262c02c into main on Jun 3, 2026
4 checks passed